Predict new groups docs #734

Merged (5 commits) on Oct 10, 2023

GStechschulte
Collaborator

This PR adds docs for the new sample_new_groups argument of model.predict(), which was merged in PR #693. The notebook explains the motivation for the new argument (related to hierarchical models) and how to use it to predict new groups, either directly with model.predict() or with bmb.interpret.comparisons.


@GStechschulte
Collaborator Author

GStechschulte commented Oct 7, 2023

The failing tests are related to PyMC issue #6941.

review-notebook-app bot commented Oct 10, 2023

tomicapretto commented on 2023-10-10T00:51:19Z
----------------------------------------------------------------

Thanks for the very clear explanation in the second paragraph :)


GStechschulte commented on 2023-10-10T04:55:52Z
----------------------------------------------------------------

Thank you! :)

review-notebook-app bot commented Oct 10, 2023

tomicapretto commented on 2023-10-10T00:51:19Z
----------------------------------------------------------------

I'm not familiar with LabelEncoder. Why do we use it here? If it just makes something categorical, you can always pass categorical=["patient", "smoking_status"] when you create the model instance. I would like to avoid it so that the example does not depend on sklearn.

Also, what is the reason to scale weeks and fvc? Is it to make things easier for the sampler?


GStechschulte commented on 2023-10-10T04:02:48Z
----------------------------------------------------------------

The patient IDs are long, e.g., ID00007637202177411956430, and I wanted them to be patient 1, 2, 3, etc., so it is "easier" to create a new patient. But yeah, we should avoid the dependence. I can create a label encoder function using numpy.

Yes, since weeks and fvc aren't on the same scale.
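
A minimal sketch of what such a numpy-based label encoder could look like. The function name `encode_labels` and the short IDs are hypothetical, chosen only for illustration; this is not necessarily the helper that ended up in the notebook.

```python
import numpy as np


def encode_labels(values):
    """Map each distinct label to an integer code 0..k-1.

    Codes follow the sorted order of the distinct labels.
    """
    # np.unique returns the sorted distinct labels and, with
    # return_inverse=True, the position of every original value
    # within that sorted array.
    labels, codes = np.unique(np.asarray(values), return_inverse=True)
    return codes, labels


# Hypothetical short IDs standing in for the long patient identifiers
patient_ids = ["ID0003", "ID0001", "ID0003", "ID0002"]
codes, labels = encode_labels(patient_ids)
```

Returning `labels` alongside `codes` keeps the mapping reversible, since `labels[codes]` recovers the original identifiers.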

tomicapretto commented on 2023-10-10T11:09:30Z
----------------------------------------------------------------

Makes sense. I get it makes things easier later when you want to predict for particular individuals. Maybe this can be achieved by doing something like

series.map({value: i for i, value in enumerate(series.unique())}).astype(object)

review-notebook-app bot commented Oct 10, 2023

tomicapretto commented on 2023-10-10T00:51:20Z
----------------------------------------------------------------

I would add that we exclude the global intercept so smoking_status uses cell means encoding (i.e. the coefficient represents the mean of the group). And since we don't have a global intercept, we don't include a deflection around it, because it's simply not there. However, we do include a deflection around the weeks slope with weeks | patient.
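
The cell means point can be illustrated with a small sketch. It uses ordinary least squares on hypothetical toy data rather than a Bambi model, so it only shows how the encoding changes the meaning of the coefficients; in a Bayesian fit the priors would pull the estimates away from the exact sample means.

```python
import numpy as np

# Hypothetical toy data: two observations per group
group = np.array(["a", "a", "b", "b", "c", "c"])
y = np.array([1.0, 3.0, 10.0, 12.0, 20.0, 24.0])

levels = np.unique(group)

# Cell means encoding: one indicator column per level, no intercept column
X_cell = (group[:, None] == levels[None, :]).astype(float)
coef_cell, *_ = np.linalg.lstsq(X_cell, y, rcond=None)

# Reference encoding: intercept column plus indicators for all but the first level
X_ref = np.column_stack([np.ones(len(y)), X_cell[:, 1:]])
coef_ref, *_ = np.linalg.lstsq(X_ref, y, rcond=None)

# Sample mean of y within each group, in the same order as `levels`
group_means = np.array([y[group == level].mean() for level in levels])
```

Without the intercept, each coefficient in `coef_cell` matches the corresponding group mean. With the intercept, the first entry of `coef_ref` is the mean of the reference group and the remaining entries are differences from it.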


GStechschulte commented on 2023-10-10T04:53:56Z
----------------------------------------------------------------

I am a bit confused about the "deflection" terminology. When you say "deflection", are you describing "variation"? For example,

However, we do include a deflection around the weeks slope with weeks | patient

Is like saying "the weeks slope is allowed to vary by individual patients"?

Edit: I just looked up deflection regarding statistical modelling:

"deflection" is often used to describe how coefficients (typically regression coefficients) deviate or vary from some reference point. It is a way to express how the effect of a predictor variable varies across different groups or levels of that variable.

tomicapretto commented on 2023-10-10T11:12:26Z
----------------------------------------------------------------

Is like saying "the weeks slope is allowed to vary by individual patients"?

Exactly. The slope for an individual j is b_{week, j} = b_week + u_j, where b_week is the common slope and u_j is the deflection around that common slope for individual j.
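
As a rough numerical illustration of that decomposition, the sketch below simulates patient-specific slopes. The values of `b_week` and `sigma_u` are made up, and the code does not reproduce how Bambi estimates or samples these quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

b_week = -0.5    # hypothetical common slope
sigma_u = 0.2    # hypothetical standard deviation of the deflections
n_patients = 5

# One deflection u_j per patient, centered on zero
u = rng.normal(0.0, sigma_u, size=n_patients)

# Patient-specific slopes: b_{week, j} = b_week + u_j
b_week_j = b_week + u
```

A patient that was not in the training data has no estimated u_j, so some value has to be supplied for it before a patient-specific slope can be formed.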

review-notebook-app bot commented Oct 10, 2023

tomicapretto commented on 2023-10-10T00:51:22Z
----------------------------------------------------------------

Do we need to use this very large number of tune steps? I would increase the number of draws because of the autocorrelation. Also, do we need init="auto"?


GStechschulte commented on 2023-10-10T04:09:22Z
----------------------------------------------------------------

Nope to the large number of tune steps, yes to more draws (increasing draws reduces the autocorrelation), and nope to init="auto" :)

review-notebook-app bot commented Oct 10, 2023

tomicapretto commented on 2023-10-10T00:51:23Z
----------------------------------------------------------------

I would say something around the posteriors for week and weeks|patient , where we see that the slope can be very different for some individuals.


GStechschulte commented on 2023-10-10T04:20:07Z
----------------------------------------------------------------

Yup, good catch!

@tomicapretto
Collaborator

@GStechschulte it's already in very good shape, just some minor comments.

@GStechschulte
Collaborator Author

@GStechschulte it's already in very good shape, just some minor comments.

Thanks for the kind words and review! Much appreciated!

Collaborator

tomicapretto commented Oct 10, 2023

Makes sense. I get it makes things easier later when you want to predict for particular individuals. Maybe this can be achieved by doing something like

series.map({value: i for i, value in enumerate(series.unique())}).astype(object)

Edit: Just saw that you already modified it.

@tomicapretto
Collaborator

@GStechschulte looks perfect, thanks!

@tomicapretto tomicapretto merged commit 3aaebca into bambinos:main Oct 10, 2023
1 of 4 checks passed
@GStechschulte GStechschulte deleted the predict-groups-examples branch January 21, 2024 20:19